IMPROVED ANALYSES OF THE RANDOMIZED POWER METHOD
AND BLOCK LANCZOS METHOD 

SHUSEN WANG*, ZHIHUA ZHANG†, AND TONG ZHANG*

Abstract. The power method and block Lanczos method are popular numerical algorithms for
solving the truncated singular value decomposition (SVD) and eigenvalue decomposition problems.
Especially in the literature of randomized numerical linear algebra, the power method is widely
applied to improve the quality of randomized sketching, and relative-error bounds have been well
established. Recently, Musco & Musco (2015) proposed a block Krylov subspace method that fully
exploits the intermediate results of the power iteration to accelerate convergence. They showed
spectral gap-independent bounds which are stronger than those of the power method by an order of magnitude.
This paper offers novel error analysis techniques and significantly improves the bounds of both the
randomized power method and the block Lanczos method. This paper also establishes the first
gap-independent bound for the warm-start block Lanczos method.

Key words. random projection, block Lanczos, power method, Chebyshev polynomials, Krylov
subspace
1. Introduction. Efficiently computing the truncated eigenvalue decomposition
and singular value decomposition (SVD) is one of the most important topics in numerical
linear algebra. The power and block Lanczos methods are the most popular
approaches to such tasks, and spectral gap-dependent bounds [1, 14, 8, 19] and
gap-independent bounds [5, 18, 4, 12] have been established.

This work is closely related to the recent advancements in the randomized numerical
linear algebra (NLA) community [2, 5, 10, 18]. The seminal work [5] proposed
to efficiently and approximately compute the truncated SVD by random projection.
Specifically, let $M \in \mathbb{R}^{m\times n}$ be any matrix, $k$ ($\ll m, n$) be the target rank,
$X \in \mathbb{R}^{n\times p}$ ($p \ge k$) be a standard Gaussian matrix, and $C = MX \in \mathbb{R}^{m\times p}$ be a sketch of
$M$. Let $\mathcal{P}^{\xi}_{C,k}(M) \in \mathbb{R}^{m\times n}$ be the rank $k$ projection of $M$ on the column space of $C$
(defined in Section 2). When $p = k/\epsilon$, the matrix $\mathcal{P}^{F}_{C,k}(M)$ approximates $M$ with
$(1+\epsilon)$ Frobenius norm relative-error bound. Unfortunately, the $(1+\epsilon)$ relative-error
spectral norm bound, which is more useful than the Frobenius norm bound, cannot
hold due to the lower bound of [17].

To improve the approximation quality, Halko et al. [5] proposed to run the power
method (specifically, the simultaneous subspace iteration) multiple times. Specifically,
after obtaining the sketch $C_{[0]} = MX \in \mathbb{R}^{m\times p}$, one can run the power iteration $t$ times
to obtain

$C_{[t]} = (MM^T)^t C_{[0]} = (MM^T)^t MX \in \mathbb{R}^{m\times p}.$

The sketch matrix $C_{[t]}$ is obviously better than $C_{[0]}$: when $t = O(\frac{\log n}{\epsilon})$, the sketch
matrix $C_{[t]}$ admits $(1+\epsilon)$ spectral norm relative-error bound [2, 4]. In this paper, by
different proof techniques, we improve this $(1+\epsilon)$ bound (see Theorem 3.2).
To accelerate convergence, one can keep the intermediate results, namely the small-scale
sketches $C_{[0]}, \cdots, C_{[d]}$, to form the Krylov matrix

$K = [C_{[0]}, \cdots, C_{[d]}] = [MX, (MM^T)MX, \cdots, (MM^T)^d MX] \in \mathbb{R}^{m\times(d+1)p}.$

The approximation theory of Chebyshev polynomials [15] tells us that the Krylov
subspace $\mathcal{K} = \mathrm{range}(K)$ almost contains the top $k$ principal components of $M$. Musco
& Musco [12] recently showed that when $d = O(\frac{\log n}{\sqrt{\epsilon}})$, using $K$ as a sketch of $M$, the
truncated SVD of $M$ can be solved with $(1+\epsilon)$ spectral norm relative-error bound.
The fact that $d$ is much smaller than $t$ indicates that far fewer power iterations are needed.
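
The two sketches discussed above can be illustrated numerically. The following NumPy snippet is a minimal sketch under stated assumptions, not the implementation analyzed in this paper: the names `power_sketch`, `krylov_sketch`, and `projection_error` are hypothetical, and every block is orthonormalized by QR, a numerical safeguard that leaves the column spaces unchanged.

```python
import numpy as np

def power_sketch(M, X, t):
    """Orthonormal basis of range((M M^T)^t M X), the power-iteration sketch."""
    C, _ = np.linalg.qr(M @ X)
    for _ in range(t):
        # One power iteration; QR keeps the basis well conditioned.
        C, _ = np.linalg.qr(M @ (M.T @ C))
    return C

def krylov_sketch(M, X, d):
    """Orthonormal basis of range([MX, (MM^T)MX, ..., (MM^T)^d MX])."""
    block, _ = np.linalg.qr(M @ X)
    blocks = [block]
    for _ in range(d):
        block, _ = np.linalg.qr(M @ (M.T @ block))
        blocks.append(block)  # keep every intermediate block
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q

def projection_error(M, Q):
    """Spectral norm of M minus its projection onto range(Q)."""
    return np.linalg.norm(M - Q @ (Q.T @ M), 2)
```

Because range($C_{[d]}$) is contained in range($K$), the projection error of the Krylov sketch can never exceed that of the power sketch with the same number of iterations; the price is a basis with $(d+1)p$ columns instead of $p$.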

Notice that as $d$ grows large, the blocks in $K$ become increasingly linearly
dependent, which leads to numerical instability. Therefore, the classical work in NLA
has many treatments to prevent this from happening, e.g. reorthogonalization or partial
reorthogonalization. This is beyond the scope of this paper; the readers can refer
to [3, Chapter 4] for discussions of Lanczos procedures without reorthogonalization.
In this paper we consider only the convergence issues without going into the
details of numerically stable implementations. In addition, in finding low-precision
approximations, $d$ is small and the columns of $K$ are unlikely to be nearly
dependent, and thus numerical stability is not a big issue. In finding low-precision
solutions, Musco & Musco [12] naively used $K$ as the sketch without performing additional
numerical treatments to each block, yet their experiments still demonstrate
good performance.

It is worth mentioning that there is a subtle difference between the work in the
randomized NLA community and that in the traditional NLA community. In the classical
Krylov methods [14, 8], the block size $p$ is usually a small integer or even one.
In contrast, in this paper, as well as in [12], we make $p$ greater than or equal to the
target rank $k$. This difference is due to different motivations. The classical block
Lanczos methods set $p$ small so as to find a few eigen-pairs (or singular value triples)
at a time. In this work and [12], the block size $p$ is in accordance with the scale of
the simultaneous subspace iteration, and the Krylov matrix $K$ serves as a sketch of
the data matrix: it is indeed the record of the output of every simultaneous subspace
iteration.

We will discuss the effects of block size in Section 6.3. We find that a big block size
leads to a large per-iteration cost but a small number of iterations. In fact, a big block
size does not necessarily lead to a large total time or memory cost. More importantly,
in big data applications where the data do not fit in memory, a pass through the data
is expensive due to the I/O costs of memory-disk swaps. In such applications, our
setting of big block size has an advantage over the classical ones for its pass-efficiency.
However, in computing high-precision SVD, the results in this paper and [12] may
not be useful due to high computational costs and numerical stability issues.

This paper offers several novel and stronger results for simultaneous subspace 
iteration and the block Krylov subspace method: 

• For the block Lanczos method, we establish gap-independent¹ matrix norm
bounds, which are worse than the best rank $k$ approximation by $\epsilon\sigma_{p+1}^2$. Notice
that in [12] the error term is $\epsilon\sigma_{k+1}^2$, which is weaker than our result.

• For the block Lanczos method, the convergence analysis of [12] unnecessarily
makes $d$ depend on $\log n$. We eliminate the $\log n$ dependence to improve the
convergence analysis. In this way, we obtain the first gap-independent bound
for the warm-start block Lanczos method.

¹Gap-independent means that the convergence rate does not depend on the spectral gaps.

• We offer a novel gap-independent matrix norm bound for the power method,
which is stronger than the existing results [2, 4, 5, 18].

As a minor contribution, we establish gap-dependent bounds with random initialization
analysis. Though the power method and the Lanczos method are usually
initialized by Gaussian matrices, to the best of our knowledge, similar initialization
analysis is absent in the literature.²

The remainder of this paper is organized as follows. Section 2 defines the notation
and introduces preliminaries of matrix algebra. Section 3 presents the main
theorems: the improved gap-independent bounds of the power method and the block
Lanczos method. Sections 4 and 5 prove the main theorems: Section 4 analyzes the
error incurred by random initialization, and Section 5 establishes gap-independent
convergence bounds. Section 6 establishes the gap-dependent bounds and discusses
the effect of block size. Additional lemmas and proofs are deferred to the appendix.

2. Preliminaries. In this section we define the notation and introduce the preliminaries
in matrix algebra, polynomial functions, and matrix functions.

2.1. Elementary Matrix Algebra. Let $M$ be any $m \times n$ matrix and

(2.1)  $M = U\Sigma V^T = \underbrace{U_k\Sigma_kV_k^T}_{=M_k} + \underbrace{U_{-k}\Sigma_{-k}V_{-k}^T}_{=M_{-k}} = \sum_{i=1}^{n}\sigma_i u_i v_i^T$

be the full singular value decomposition (SVD), where the diagonal entries of $\Sigma$ are in
descending order, and $U_k$ ($m \times k$), $\Sigma_k$ ($k \times k$), $V_k$ ($n \times k$) are the top $k$ principal
components.

The principal angle between subspaces is defined as follows. Let $X \in \mathbb{R}^{n\times p}$
($p \ge k$) have full column rank, $U_k \in \mathbb{R}^{n\times k}$ have orthonormal columns, $U_{-k} \in
\mathbb{R}^{n\times(n-k)}$ be the orthogonal complement of $U_k$, and $\mathcal{P}_k$ be the collection of rank $k$
orthogonal projectors. Following [6], we define the principal angle between range($U_k$)
and range($X$) by

$\tan\theta_k(U_k, X) = \min_{P\in\mathcal{P}_k}\ \max_{\|w\|_2=1,\ Pw=w}\ \frac{\|U_{-k}^TXw\|_2}{\|U_k^TXw\|_2}.$

Obviously, range($U_k$) $\subset$ range($X$) if $\tan\theta_k(U_k, X) = 0$. When rank($X$) $= k$, we use
$\theta$ to replace $\theta_k$.
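
For the special case rank($X$) $= k$, the definition above reduces to a single spectral norm: substituting $v = U_k^TXw$ shows that the maximum of the ratio equals $\|U_{-k}^TX(U_k^TX)^{-1}\|_2$ whenever $U_k^TX$ is invertible. The snippet below is a small illustration of that special case only; the function name `tan_principal_angle` is hypothetical, and the general case $p > k$ (which involves the minimization over projectors) is not covered.

```python
import numpy as np

def tan_principal_angle(U_k, X):
    """tan(theta(U_k, X)) for X with exactly k columns and U_k^T X invertible."""
    n, k = U_k.shape
    # Complete U_k to an orthonormal basis of R^n; the trailing columns
    # span the orthogonal complement of range(U_k).
    U_full, _ = np.linalg.qr(U_k, mode="complete")
    U_perp = U_full[:, k:]
    top = U_k.T @ X         # k x k
    bottom = U_perp.T @ X   # (n - k) x k
    # max_w ||bottom w|| / ||top w|| equals || bottom top^{-1} ||_2.
    return np.linalg.norm(bottom @ np.linalg.inv(top), 2)
```

The value is zero exactly when range($X$) coincides with range($U_k$), and it grows as $X$ tilts toward the orthogonal complement.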

We define the projection operations as follows. Let $C \in \mathbb{R}^{m\times c}$ have rank
$p$. The matrix $CC^\dagger$ is an orthogonal projector, and $CC^\dagger M$ projects $M$ on range($C$).
We let $\mathcal{P}_p$ be the collection of rank $p$ orthogonal projectors (with proper size). The
matrix $I_m - CC^\dagger$ is a rank $m - p$ orthogonal projector. We define the projection
operations by

$\mathcal{P}^{\xi}_{C,k}(M) = C \cdot \mathrm{argmin}_{\mathrm{rank}(Y)\le k}\ \|M - CY\|_\xi^2$

for $k \le p$ and $\xi = 2$ or $F$. Notice that $\mathcal{P}^{2}_{C,k}(\cdot)$ and $\mathcal{P}^{F}_{C,k}(\cdot)$ are different in general,
and they are not orthogonal projectors unless $k$ = rank($C$). The following inequality

²Our random initialization analysis resembles [5, 6], but our result is stronger and more useful.
holds due to the optimality of the orthogonal projector $CC^\dagger$ [4]:

(2.2)  $\|M - CC^\dagger M\|_\xi^2 \le \|M - \mathcal{P}^{\xi}_{C,k}(M)\|_\xi^2.$

Let $Q_C \in \mathbb{R}^{m\times p}$ be any orthonormal basis of range($C$). It was shown in [18] that

(2.3)  $\mathcal{P}^{F}_{C,k}(M) = Q_C (Q_C^TM)_k.$

However, unless $k$ = rank($C$), a closed-form expression of the spectral norm case
$\mathcal{P}^{2}_{C,k}(M)$ is not yet known.
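
Equation (2.3) translates directly into a few lines of NumPy. The snippet below is an illustrative sketch that assumes $C$ has full column rank (so that a reduced QR factorization yields $Q_C$); the helper names `best_rank_k` and `frobenius_projection` are hypothetical.

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation A_k via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def frobenius_projection(C, M, k):
    """Rank-k projection of M on range(C) following (2.3): Q_C (Q_C^T M)_k."""
    Q, _ = np.linalg.qr(C)  # orthonormal basis of range(C), C full column rank
    return Q @ best_rank_k(Q.T @ M, k)
```

When $k$ equals the number of columns of $C$, the result coincides with the full orthogonal projection $CC^\dagger M$, which is the equality case of (2.2).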

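2.2. Chebyshev Polynomial and Matrix Functions. Functions of symmetric positive
semidefinite (SPSD) matrices are defined according to the eigenvalue decomposition. Let $f(\cdot)$ be any
function, $A \in \mathbb{R}^{n\times n}$ be any SPSD matrix, and $A = U\Lambda U^T = \sum_{i=1}^{n}\lambda_i u_i u_i^T$ be the
eigenvalue decomposition. Then

$f(A) = U f(\Lambda) U^T = \sum_{i=1}^{n} f(\lambda_i)\, u_i u_i^T.$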
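
The definition of $f(A)$ can be sketched in NumPy as follows. This is a minimal illustration that assumes $A$ is symmetric, so that `numpy.linalg.eigh` applies; the name `matrix_function` is hypothetical.

```python
import numpy as np

def matrix_function(A, f):
    """Apply a scalar function f to a symmetric matrix A through its eigenvalues."""
    lam, U = np.linalg.eigh(A)   # A = U diag(lam) U^T
    return (U * f(lam)) @ U.T    # U diag(f(lam)) U^T
```

For instance, taking $f(x) = x^t$ recovers the matrix power $A^t$ used by the power method.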


To analyze the Krylov subspace methods, we consider a function $\phi(x)$ defined
on the real line. Let $T_d(x)$ be the degree $d$ Chebyshev polynomial [15]. Musco & Musco [12]
defined the degree $d$ polynomial

$p(x) = (1+\gamma)\alpha\, \frac{T_d(x/\alpha)}{T_d(1+\gamma)}$

for any parameters $\alpha > 0$ and $\gamma \in (0,1]$. Accordingly, we define the function $\phi(\cdot)$:

$\phi(x) = p(x)$ if $x > 0$;  $\phi(x) = 0$ if $x \le 0$.


The properties of the function $\phi$ are listed in the following lemma.

Lemma 2.1. Suppose we specify a value $\alpha > 0$, a gap $\gamma \in (0,1]$, and a degree
$d \ge 1$. The function $\phi$ satisfies that

(i) $\phi(0) = 0$;

(ii) $\phi(x) > 0$ when $x > \alpha$;

(iii) $\phi(x) \ge x$ for all $x \ge (1+\gamma)\alpha$;

(iv) $|\phi(x)| \le \frac{\alpha}{2^{d\sqrt{\gamma}-1}}$ for all $x \in [0,\alpha]$;

(v) For any SPSD matrix $A \in \mathbb{R}^{n\times n}$, let $K = [X, AX, A^2X, \cdots, A^dX]$. Then
range($\phi(A)X$) $\subset$ range($K$).
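
Properties (iii) and (iv) of Lemma 2.1 are easy to probe numerically. The snippet below is a hedged illustration rather than part of the analysis: it assumes the convention $\phi(x) = 0$ for $x \le 0$, evaluates $T_d$ with `numpy.polynomial.chebyshev.chebval`, and uses the hypothetical names `cheb_T` and `phi`.

```python
import numpy as np
from numpy.polynomial import chebyshev

def cheb_T(d, x):
    """Evaluate the degree-d Chebyshev polynomial T_d at x."""
    coeffs = np.zeros(d + 1)
    coeffs[d] = 1.0  # select T_d in the Chebyshev basis
    return chebyshev.chebval(x, coeffs)

def phi(x, alpha, gamma, d):
    """phi(x) = (1 + gamma) alpha T_d(x / alpha) / T_d(1 + gamma) for x > 0, else 0."""
    x = np.asarray(x, dtype=float)
    p = (1 + gamma) * alpha * cheb_T(d, x / alpha) / cheb_T(d, 1 + gamma)
    return np.where(x > 0, p, 0.0)

# Example parameters (arbitrary choices for illustration).
alpha, gamma, d = 2.0, 0.25, 5
small = np.linspace(0.0, alpha, 201)                 # inputs in [0, alpha]
large = np.linspace((1 + gamma) * alpha, 10.0, 201)  # inputs >= (1 + gamma) alpha
damping_bound = alpha / 2 ** (d * np.sqrt(gamma) - 1)
```

On `small` the function stays below `damping_bound` in absolute value, while on `large` it dominates the identity; this contrast between damped and amplified eigenvalues is what drives the convergence analysis.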

3. Main Results. This section presents the main results. Section 3.1 establishes
the gap-independent matrix norm bounds of the block Lanczos and the power
method. Section 3.2 compares with the previous work.

3.1. Gap-Independent Bound. Theorem 3.1 establishes gap-independent bounds
of the block Lanczos method. The random-start bound is stronger than [12] because
$\sigma_{p+1} \le \sigma_{k+1}$ for $p \ge k$. To the best of our knowledge, no gap-independent bound of
the warm-start block Lanczos method was known before this paper.

Theorem 3.1 (Block Lanczos). Let $M \in \mathbb{R}^{m\times n}$ be any matrix, $X \in \mathbb{R}^{m\times p}$ have
full column rank, $k$ ($\le p$) be the target rank, and $K = [X, (MM^T)X, \cdots, (MM^T)^dX]$.




Randomized Power Method and Block Lanczos Method 




Table 1
The table compares the error bounds of the rank $k$ approximations and the number of iterations
to attain the bounds. Here $p$ is the block size, and it holds that $p \ge k$.

                 this work                                                   previous work
                 #iterations                  error                                        #iterations                  error
power method     $O(\epsilon^{-1}\log n)$     $\sigma_{k+1}^2 + \epsilon\sigma_{p+1}^2$    $O(\epsilon^{-1}\log n)$     $(1+\epsilon)\sigma_{k+1}^2$
block Lanczos    $O(\epsilon^{-1/2}\log n)$   $\sigma_{k+1}^2 + \epsilon\sigma_{p+1}^2$    $O(\epsilon^{-1/2}\log n)$   $(1+\epsilon)\sigma_{k+1}^2$

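• (Random Start) If $X$ is a standard Gaussian matrix and

$d = \frac{1}{\sqrt{\epsilon/2}}\left[2 + \log_2\left(\frac{\sqrt{n-p}+\sqrt{p}+\alpha}{\epsilon(\sqrt{p}-\sqrt{k}-\alpha)}\right)\right] = O\left(\frac{\log n}{\sqrt{\epsilon}}\right),$

then there exists an $m \times k$ matrix $C$ with range($C$) $\subset$ range($K$) such that

$\|M - CC^\dagger M\|_2^2 \le \|M - M_k\|_2^2 + \epsilon\|M - M_p\|_2^2$

holds with probability at least $1 - 2\exp(-\alpha^2/2)$.

• (Warm Start) If $X$ satisfies $\tan\theta_k(U_k, X) \le \beta$ and

$d = \frac{2 + \log_2(\beta/\epsilon)}{\sqrt{\epsilon/2}},$

then there exists an $m \times k$ matrix $C$ with range($C$) $\subset$ range($K$) such that

$\|M - CC^\dagger M\|_2^2 \le (1+\epsilon)\|M - M_k\|_2^2.$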

Proof. The theorem follows from Theorem 4.1 (the analysis of random initialization)
and Theorem 5.1 (the analysis of convergence). □

By nearly the same proof (see Section 5.2), we can show a gap-independent bound
for the power method. The bound is stronger than the existing results because $\sigma_{p+1} \le
\sigma_{k+1}$ for $p \ge k$. The warm-started power method can be analyzed in the
same way, so we do not elaborate here.

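Theorem 3.2 (Power Method). Let $M \in \mathbb{R}^{m\times n}$ be any matrix and $X \in \mathbb{R}^{m\times p}$
have full column rank. If $X$ is a standard Gaussian matrix and

$t = O\left(\frac{\log n}{\epsilon}\right),$

then there exists an $m \times k$ ($k \le p$) matrix $C$ with range($C$) $\subset$ range($(MM^T)^tX$) such
that

$\|M - CC^\dagger M\|_2^2 \le \|M - M_k\|_2^2 + \epsilon\|M - M_p\|_2^2$

holds with high probability.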

3.2. Comparisons to the Existing Gap-Independent Bounds. This subsection
compares our work with the existing results. The error bounds and number
of iterations are summarized in Table 1.

The seminal work of Halko, Martinsson, & Tropp [5] showed a gap-independent, 
spectral norm bound of Gaussian random projection and its power iteration extension. 











Later on, Boutsidis et al. [2] made improvements and showed that with Gaussian
random initialization $X \in \mathbb{R}^{n\times p}$ ($p \ge k$) and $t = O(\epsilon^{-1}\log n)$ power iterations, the
$(1+\epsilon)$ relative-error spectral norm bound holds with high probability (w.h.p.):

$\|M - \mathcal{P}^{2}_{C,k}(M)\|_2^2 \le (1+\epsilon)\|M - M_k\|_2^2,$

where $C = M(M^TM)^tX \in \mathbb{R}^{m\times p}$ is the output of the power iteration. In practice,
people usually set $p$ several times greater than $k$. Unreasonably, the benefit of setting
$p$ greater than $k$ is not reflected in such an error bound. This work improves the error
bound to

$\|M - \mathcal{P}^{2}_{C,k}(M)\|_2^2 \le \|M - M_k\|_2^2 + \epsilon\|M - M_p\|_2^2.$

It is worth mentioning that the previous work [2, 5, 18] makes use of the inequality
$\|PM\|_2 \le \|P(MM^T)^tM\|_2^{1/(2t+1)}$, where $P$ is an arbitrary orthogonal projector, to
show the convergence of the power method. Our result is obtained by very different
techniques.
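
The inequality used by the previous work can be checked on a random instance. The snippet below is only a numerical sanity check under arbitrary test dimensions; the helper name `power_inequality_sides` is hypothetical.

```python
import numpy as np

def power_inequality_sides(P, M, t):
    """Return (||P M||_2, ||P (M M^T)^t M||_2^(1/(2t+1)))."""
    lhs = np.linalg.norm(P @ M, 2)
    B = M
    for _ in range(t):
        B = M @ (M.T @ B)  # B = (M M^T)^j M after j steps
    rhs = np.linalg.norm(P @ B, 2) ** (1.0 / (2 * t + 1))
    return lhs, rhs

rng = np.random.default_rng(0)
M = rng.standard_normal((30, 20))
Q, _ = np.linalg.qr(rng.standard_normal((30, 5)))
P = np.eye(30) - Q @ Q.T  # orthogonal projector onto the complement of range(Q)
sides = [power_inequality_sides(P, M, t) for t in range(4)]
```

For $t = 0$ the two sides coincide, and for larger $t$ the left-hand side is dominated by the right-hand side.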

Discarding the intermediate outputs of the power iterations is actually a waste
of computation. The very recent work of Musco & Musco [12] showed that with only
$d = O(\epsilon^{-1/2}\log n)$ power iterations, the column space of

$K = [MX, M(M^TM)X, \cdots, M(M^TM)^dX]$

almost contains the top $k$ principal components of $M$. Specifically,

$\|M - \mathcal{P}^{2}_{K,k}(M)\|_2^2 \le (1+\epsilon)\|M - M_k\|_2^2$

holds w.h.p. Our analysis delivers a stronger result: with the same $d$, the inequality

$\|M - \mathcal{P}^{2}_{K,k}(M)\|_2^2 \le \|M - M_k\|_2^2 + \epsilon\|M - M_p\|_2^2$

holds w.h.p.

Furthermore, in the analysis of [12], the term $\log n$ arises from both the random
initialization and the convergence analysis. Though the $\log n$ term is inevitable in the
analysis of random initialization, it should be avoided in the convergence analysis.
Otherwise, warm start and random start simply have the same bound, which is not
in accordance with people's empirical experience [11, 9]. We eliminate the $\log n$ term
in the convergence bound (see Theorem 3.1) and obtain the first gap-independent
warm-start bound.


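3.3. Gap-Dependent Bounds. For the sake of completeness, we establish
spectral gap-dependent bounds. We show that to attain the $\sigma_{k+1}^2 + \epsilon\sigma_{p+1}^2$ matrix
norm bound, the power method and the block Lanczos method respectively need

$t = \frac{\log\left(\frac{\sqrt{n-p}\,(\sqrt{n-p}+\sqrt{p}+\alpha)}{\sqrt{\epsilon}\,(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{2\log(\sigma_k/\sigma_{p+1})} = O\left(\frac{\log(n/\epsilon)}{\log(\sigma_k/\sigma_{p+1})}\right),$

$d = \frac{1 + \log_2\left(\frac{\sqrt{n-p}\,(\sqrt{n-p}+\sqrt{p}+\alpha)}{\sqrt{\epsilon}\,(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{\sqrt{(\sigma_k/\sigma_{p+1})^2 - 1}} = O\left(\frac{\log(n/\epsilon)}{\sqrt{\sigma_k/\sigma_{p+1} - 1}}\right)$

iterations, where $\alpha$ is a parameter that controls the failure probability. Li & Zhang
(2013) [8] established a similar convergence analysis, but their analysis lacks initialization
analysis. We formally show our results in Section 6.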
4. Analysis of Random Initialization. We use a Gaussian random
matrix $X \in \mathbb{R}^{n\times p}$ as the initial guess and bound the principal angle between range($X$)
and any $k$ ($\le p \le n$) dimensional subspace range($U_k$). It is worth mentioning that
Theorem 4.1 is more interesting than the bound on $\tan\theta_k(U_k, X)$, which has been
studied in previous work [5, 4, 6]. In fact, the matrix $XZ$ in Theorem 4.1 has a special
structure which can make the convergence of power iteration and block Lanczos faster.

Theorem 4.1. Let $U = [U_k, U_{-k}] = [u_1, \cdots, u_n] \in \mathbb{R}^{n\times n}$ be an orthogonal
matrix, where $U_k$ and $U_{-k}$ are $n \times k$ and $n \times (n-k)$ matrices. Let $X$ be an $n \times p$
matrix with full column rank and $C = U^TX \in \mathbb{R}^{n\times p}$. We partition the rows of $C$ by

$C^T = [\underbrace{C_1^T}_{p\times k},\ \underbrace{C_2^T}_{p\times(p-k)},\ \underbrace{C_3^T}_{p\times(n-p)}].$

If $k \le p$, then

• There exists a $p \times k$ matrix $Z$ with orthonormal columns such that range($Z$) $\subset$
null($C_2$). Equivalently, $u_i^TXZ = 0$ for $i = k+1, \cdots, p$.

• Let $U_{-p} = [u_{p+1}, \cdots, u_n] \in \mathbb{R}^{n\times(n-p)}$. Then

$\underbrace{U_{-k}^TXZ}_{(n-k)\times k} = \begin{bmatrix} 0_{(p-k)\times k} \\ U_{-p}^TXZ \end{bmatrix}$

and $\|U_{-k}^TXZZ^Tw\|_2 = \|U_{-p}^TXZZ^Tw\|_2$ for any $w \in \mathbb{R}^p$.

• Further assume that $X$ is a standard Gaussian matrix. Then the principal
angle between range($U_k$) and range($XZ$) satisfies

$\tan\theta(U_k, XZ) \le \frac{\sqrt{n-p}+\sqrt{p}+\alpha}{\sqrt{p}-\sqrt{k}-\alpha}$

with probability at least $1 - 2\exp(-\alpha^2/2)$.

Proof. (1) Since $C_2$ is a $(p-k) \times p$ matrix, we have nullity($C_2$) $= p - $ rank($C_2$) $\ge k$.
So there exist matrices $Z \in \mathbb{R}^{p\times k}$ with orthonormal columns such that range($Z$) $\subset$
null($C_2$). Since $C = U^TX$, it is easy to see that $C_2Z = 0$ is equivalent to $u_i^TXZ = 0$
for $i = k+1, \cdots, p$.

(2) The second property follows directly from $u_i^TXZ = 0_{1\times k}$ for $i = k+1, \cdots, p$.

(3) If $X$ is a standard Gaussian matrix, then $C = U^TX$ is also a standard Gaussian
matrix, and so is any submatrix of $C$. Since $w = ZZ^Tw$, it follows from the second
property that $\|U_{-k}^TXw\|_2 = \|U_{-k}^TXZZ^Tw\|_2 = \|U_{-p}^TXZZ^Tw\|_2 = \|U_{-p}^TXw\|_2$.
Therefore,

$\tan\theta(U_k, XZZ^T) = \max_{\|w\|_2=1,\ ZZ^Tw=w} \frac{\|U_{-k}^TXw\|_2}{\|U_k^TXw\|_2} = \max_{\|w\|_2=1,\ ZZ^Tw=w} \frac{\|U_{-p}^TXw\|_2}{\|U_k^TXw\|_2}$

$= \max_{\|w\|_2=1,\ ZZ^Tw=w} \frac{\|C_3w\|_2}{\|C_1w\|_2} \le \max_{\|w\|_2=1,\ ZZ^Tw=w} \frac{\|C_3\|_2\|w\|_2}{\sigma_k(C_1)\|w\|_2} = \frac{\|C_3\|_2}{\sigma_k(C_1)}.$

Since $C_3 \in \mathbb{R}^{(n-p)\times p}$ is standard Gaussian, it follows from
[16, Corollary 5.35] that

$\|C_3\|_2 \le \sqrt{n-p}+\sqrt{p}+\alpha$

holds with probability at least $1 - \exp(-\alpha^2/2)$. Since $C_1 \in \mathbb{R}^{k\times p}$ is standard Gaussian,
it follows from [16, Corollary 5.35] that

$\sigma_k(C_1) \ge \sqrt{p}-\sqrt{k}-\alpha$

holds with probability at least $1 - \exp(-\alpha^2/2)$. Finally, using the union bound, we
obtain

$\tan\theta(U_k, XZZ^T) \le \frac{\|C_3\|_2}{\sigma_k(C_1)} \le \frac{\sqrt{n-p}+\sqrt{p}+\alpha}{\sqrt{p}-\sqrt{k}-\alpha}$

with probability at least $1 - 2\exp(-\alpha^2/2)$. □
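
The construction of $Z$ in the first two parts of Theorem 4.1 can be reproduced numerically. The snippet below is a minimal sketch under arbitrary test dimensions: it takes the last $k$ right singular vectors of $C_2$ as an orthonormal basis inside null($C_2$) (one of many valid choices), and the helper name `null_space_basis` is hypothetical. It checks only the deterministic structure and does not test the probabilistic bound.

```python
import numpy as np

def null_space_basis(C2, k):
    """k orthonormal vectors in null(C2), where C2 has p - k rows and p columns."""
    _, _, Vt = np.linalg.svd(C2, full_matrices=True)
    # C2 has only p - k rows, so the last k right singular vectors lie in its null space.
    return Vt[-k:].T

rng = np.random.default_rng(0)
n, p, k = 40, 10, 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal U = [u_1, ..., u_n]
X = rng.standard_normal((n, p))                   # Gaussian initial guess
C = U.T @ X
Z = null_space_basis(C[k:p], k)                   # C[k:p] plays the role of C_2

w = Z @ rng.standard_normal(k)                    # any w with Z Z^T w = w
norm_minus_k = np.linalg.norm(U[:, k:].T @ X @ w)
norm_minus_p = np.linalg.norm(U[:, p:].T @ X @ w)
```

The rows of $U^TXZ$ indexed by $k+1, \cdots, p$ vanish, so `norm_minus_k` and `norm_minus_p` agree up to rounding, which is exactly the structure exploited in the convergence proofs.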

5. Gap-Independent Convergence Bounds. Sections 5.1 and 5.2 establish
gap-independent convergence bounds for the block Lanczos method and the power
method, respectively. The only difference in their proofs is the use of different matrix
functions.

5.1. The Block Lanczos Method. Theorem 5.1 establishes a gap-independent
spectral norm bound for the block Lanczos method. The theorem demonstrates an $O(1/d^2)$
convergence rate, where $d$ is the number of power iterations.

Theorem 5.1. Let $M \in \mathbb{R}^{m\times n}$ be any matrix with SVD defined by (2.1), and
$X \in \mathbb{R}^{m\times p}$ have full column rank and satisfy rank($U_k^TX$) $= k$ ($k \le p$). Let $\phi$ be the
function (parameterized by $d$) defined in Section 2.2.

• Let

$d = \frac{1}{\sqrt{\epsilon}} + \frac{1}{\sqrt{\epsilon}}\log_2\frac{\tan\theta_k(U_k, X)}{\epsilon}.$

Then there exists an $m \times k$ matrix $C$ satisfying range($C$) $\subset$ range($\phi(MM^T)X$)
and

$\|M - CC^\dagger M\|_2^2 \le (1+2\epsilon)\sigma_{k+1}^2.$

• Let $Z \in \mathbb{R}^{p\times k}$ have orthonormal columns and satisfy range($XZ$) $\subset$ null($[u_{k+1},
\cdots, u_p]^T$). Let

$d = \frac{1}{\sqrt{\epsilon}} + \frac{1}{\sqrt{\epsilon}}\log_2\frac{\tan\theta(U_k, XZ)}{\epsilon}.$

Then the $m \times k$ matrix $C = \phi(MM^T)XZ$ satisfies

$\|M - CC^\dagger M\|_2^2 \le \max\left\{(1+2\epsilon)\sigma_{p+1}^2,\ \sigma_{k+1}^2 + \epsilon\sigma_{p+1}^2\right\} \le \sigma_{k+1}^2 + 2\epsilon\sigma_{p+1}^2.$

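Proof. We will set $\alpha$ to be either $\sigma_{k+1}^2$ or $\sigma_{p+1}^2$, so the inequality $\sigma_k^2 \ge \alpha$ always
holds. In addition, the matrix $I_m - CC^\dagger$ is an orthogonal projector, and the function
$\phi$ satisfies the requirements in Section B. Thus we can apply Lemma B.2 to show that

$\|(I_m - CC^\dagger)M\|_2^2 \le \|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2 + \max\{\sigma_{k+1}^2,\ (1+\gamma)\alpha\}.$

It remains to bound the term $\|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2$ by Lemma C.1 or Lemma C.3.

(1) We let $\Pi$ be the rank $k$ orthogonal projector $\Pi = \mathrm{argmin}_{P\in\mathcal{P}_k} \tan\theta(U_k, XP)$
and $Z_1 \in \mathbb{R}^{p\times k}$ have orthonormal columns and satisfy $Z_1Z_1^T = \Pi$. Let $C =
\phi(MM^T)XZ_1 \in \mathbb{R}^{m\times k}$. Lemma C.1 shows that

$\|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2 \le \tan\theta_k(U_k, X)\, \|\phi(M_{-k}M_{-k}^T)\|_2.$

We set $\alpha = \sigma_{k+1}^2$, $\gamma = \epsilon$, and $d = \frac{\log_2(c/\epsilon) + 1}{\sqrt{\gamma}}$; then it follows from
Lemma 2.1 that

$|\phi(\sigma_i^2)| \le \frac{\alpha}{2^{d\sqrt{\gamma}-1}} = \frac{\sigma_{k+1}^2}{2^{d\sqrt{\gamma}-1}} = \frac{\epsilon}{c}\sigma_{k+1}^2$

for $i = k+1, \cdots, n$. Thus

$\|\phi(M_{-k}M_{-k}^T)\|_2 = \max\{|\phi(\sigma_{k+1}^2)|, \cdots, |\phi(\sigma_n^2)|\} \le \frac{\epsilon}{c}\sigma_{k+1}^2.$

We conclude that

$\|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2 \le \tan\theta_k(U_k, X)\, \frac{\epsilon}{c}\sigma_{k+1}^2 = \epsilon\sigma_{k+1}^2,$

where the equality follows by setting $c = \tan\theta_k(U_k, X)$.

(2) We let $Z_2 \in \mathbb{R}^{p\times k}$ have orthonormal columns and satisfy range($XZ_2$) $\subset$
null($[u_{k+1}, \cdots, u_p]^T$). Let $C = \phi(MM^T)XZ_2 \in \mathbb{R}^{m\times k}$. Lemma C.3 shows that

$\|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2 \le \tan\theta(U_k, XZ_2)\, \|\phi(M_{-p}M_{-p}^T)\|_2.$

We set $\alpha = \sigma_{p+1}^2$, $\gamma = \epsilon$, and $d = \frac{\log_2(c/\epsilon) + 1}{\sqrt{\gamma}}$; then it follows from
Lemma 2.1 that

$|\phi(\sigma_i^2)| \le \frac{\alpha}{2^{d\sqrt{\gamma}-1}} = \frac{\sigma_{p+1}^2}{2^{d\sqrt{\gamma}-1}} = \frac{\epsilon}{c}\sigma_{p+1}^2$

for $i = p+1, \cdots, n$. We conclude that

$\|(I_m - CC^\dagger)\phi(M_kM_k^T)\|_2 \le \tan\theta(U_k, XZ_2)\, \frac{\epsilon}{c}\sigma_{p+1}^2 = \epsilon\sigma_{p+1}^2,$

where the equality follows by setting $c = \tan\theta(U_k, XZ_2)$. □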

5.2. The Power Method. We define the order $t$ monomial function

$f(x) = (1+\gamma)\alpha\, \frac{(x/\alpha)^t}{(1+\gamma)^t},$

where $\alpha > 0$ and $\gamma \in (0,1]$. It is easy to show that (i) $f(0) = 0$, (ii) $f(x) > 0$ when
$x > 0$, (iii) $f(x) \ge x$ when $x \ge (1+\gamma)\alpha$, and (iv) $f(x) \le \frac{\alpha}{(1+\gamma)^{t-1}}$ when $0 \le x \le \alpha$. Then
the proof is almost the same as that of the block Lanczos method.

Theorem 5.2. Let $M \in \mathbb{R}^{m\times n}$ be any matrix with SVD defined by (2.1), and
$X \in \mathbb{R}^{m\times p}$ have full column rank and satisfy rank($U_k^TX$) $= k$ ($k \le p$). Let $Z \in \mathbb{R}^{p\times k}$
have orthonormal columns and satisfy range($XZ$) $\subset$ null($[u_{k+1}, \cdots, u_p]^T$). Let
Then the $m \times k$ matrix $C = (MM^T)^tXZ$ satisfies

$\|M - CC^\dagger M\|_2^2 \le \sigma_{k+1}^2 + 2\epsilon\sigma_{p+1}^2.$


Proof. We will set $\alpha$ to be $\sigma_{p+1}^2$, so the inequality $\sigma_k^2 \ge \alpha$ always holds.
In addition, the function $f$ satisfies the requirements in Section B. So we can apply
Lemma B.2 to show that for any positive integer $k$ ($\le m, n$) and any orthogonal
projector $T$, the following holds:

$\|TM\|_2^2 \le \|Tf(M_kM_k^T)\|_2 + \max\{\sigma_{k+1}^2,\ (1+\gamma)\alpha\}.$

It is obvious that range($f(MM^T)X$) = range($(MM^T)^tX$). Letting $T = I_m - CC^\dagger =
I_m - (f(MM^T)XZ)(f(MM^T)XZ)^\dagger$ and applying Lemma C.3, we obtain

$\|(I_m - CC^\dagger)M\|_2^2 \le \tan\theta(U_k, XZ)\, \|f(M_{-p}M_{-p}^T)\|_2 + \max\{\sigma_{k+1}^2,\ (1+\gamma)\alpha\}$

$= \tan\theta(U_k, XZ)\, \max\{f(\sigma_{p+1}^2), \cdots, f(\sigma_n^2)\} + \max\{\sigma_{k+1}^2,\ (1+\gamma)\alpha\}$

$\le \epsilon\sigma_{p+1}^2 + \max\{\sigma_{k+1}^2,\ (1+\epsilon)\sigma_{p+1}^2\} \le \sigma_{k+1}^2 + 2\epsilon\sigma_{p+1}^2,$

where the second inequality follows by setting $\alpha = \sigma_{p+1}^2$ and $\gamma = \epsilon$ and applying the
fourth property of $f$. □

6. Gap-Dependent Bounds. Section 6.1 provides gap-dependent principal angle
bounds. Section 6.2 shows how to convert the principal angle bounds to the matrix
norm bounds. In this section we denote the SVD of $M \in \mathbb{R}^{m\times n}$ by (2.1).

6.1. Gap-Dependent, Principal Angle Bound. This subsection studies the
power iteration and the block Lanczos method and establishes gap-dependent principal
angle bounds.

Theorem 6.1. Let $M$ be any $m \times n$ matrix, $X$ be an $n \times p$ standard Gaussian
matrix, and $K = [X, (M^TM)X, \cdots, (M^TM)^dX]$. When

$t = \frac{\log\left(\frac{\sqrt{n-p}+\sqrt{p}+\alpha}{\epsilon(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{2\log(\sigma_k/\sigma_{p+1})} = O\left(\frac{\log(n/\epsilon)}{\log(\sigma_k/\sigma_{p+1})}\right),$

$d = \frac{1 + \log_2\left(\frac{\sqrt{n-p}+\sqrt{p}+\alpha}{\epsilon(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{\sqrt{(\sigma_k/\sigma_{p+1})^2 - 1}} = O\left(\frac{\log(n/\epsilon)}{\sqrt{\sigma_k/\sigma_{p+1} - 1}}\right),$

the power iteration and the block Lanczos respectively satisfy

$\tan\theta_k(V_k, (M^TM)^tX) \le \epsilon$  and  $\tan\theta_k(V_k, K) \le \epsilon$

with probability at least $1 - 2\exp(-\alpha^2/2)$.

Proof. The initial error can be bounded by Theorem 4.1. Since $A = M^TM \in
\mathbb{R}^{n\times n}$ is SPSD, we can apply Theorem 6.5 (the power method) and Theorem 6.6
(the block Lanczos method) to analyze the convergence. The theorem follows from
Lemma 2.1 that range($\phi(M^TM)X$) $\subset$ range($K$). □
6.2. From Principal Angle Bound to Matrix Norm Bound. We establish
Lemma 6.2 to help us convert the principal angle bound to spectral/Frobenius norm
bounds. Then we obtain the gap-dependent matrix norm bound in Theorem 6.3.

Lemma 6.2. Let $M \in \mathbb{R}^{m\times n}$ be any matrix, $W \in \mathbb{R}^{n\times p}$ ($p \ge k$) have full column
rank and satisfy rank($V_k^TW$) $= k$, and $\xi = 2$ or $F$. Then there exists an $m \times k$ matrix
$C$ satisfying range($C$) $\subset$ range($MW$) and

$\|M - CC^\dagger M\|_\xi^2 \le (1 + \tan^2\theta_k(V_k, W))\, \|M - M_k\|_\xi^2.$

Let $Z \in \mathbb{R}^{p\times k}$ have orthonormal columns and satisfy range($WZ$) $\subset$ null($[v_{k+1}, \cdots, v_p]^T$).
Let $C = MWZ \in \mathbb{R}^{m\times k}$. Then

$\|M - CC^\dagger M\|_\xi^2 \le \|M - M_k\|_\xi^2 + \tan^2\theta(V_k, WZ)\, \|M - M_p\|_\xi^2.$


Proof. The result follows directly from Corollary C.2 and Corollary C.4. □

By applying Lemma 6.2, the principal angle bound in Theorem 6.1 can be converted
to the following matrix norm bounds. The theorem is stronger than its counterpart
in [12], because $\sigma_{p+1}$ cannot exceed $\sigma_{k+1}$ when $p \ge k$.

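Theorem 6.3. Let $M$, $X$, $K$ be defined in Theorem 6.1, and let

$t = \frac{\log\left(\frac{\sqrt{n-p}\,(\sqrt{n-p}+\sqrt{p}+\alpha)}{\sqrt{\epsilon}\,(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{2\log(\sigma_k/\sigma_{p+1})} = O\left(\frac{\log(n/\epsilon)}{\log(\sigma_k/\sigma_{p+1})}\right),$

$d = \frac{1 + \log_2\left(\frac{\sqrt{n-p}\,(\sqrt{n-p}+\sqrt{p}+\alpha)}{\sqrt{\epsilon}\,(\sqrt{p}-\sqrt{k}-\alpha)}\right)}{\sqrt{(\sigma_k/\sigma_{p+1})^2 - 1}} = O\left(\frac{\log(n/\epsilon)}{\sqrt{\sigma_k/\sigma_{p+1} - 1}}\right).$

Let $Q$ be an orthonormal basis of either $M(M^TM)^tX \in \mathbb{R}^{m\times p}$ or $MK \in \mathbb{R}^{m\times(d+1)p}$.
Then it holds with probability at least $1 - 2\exp(-\alpha^2/2)$ that

$\|M - Q(Q^TM)_k\|_\xi^2 \le \|M - M_k\|_\xi^2 + \epsilon\sigma_{p+1}^2$  for $\xi = 2$ or $F$.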
Proof. Let $W = (M^TM)^tX \in \mathbb{R}^{n\times p}$ in the power iteration case, and let $W =
\phi(M^TM)X \in \mathbb{R}^{n\times p}$ in the block Lanczos case, where $\phi$ is the function defined in
Lemma 2.1. Applying Lemma 6.2, Theorem 4.1 (random initialization), Theorem 6.5
(convergence of power iteration), Theorem 6.6 (convergence of block Lanczos), and
that range($\phi(M^TM)X$) $\subset$ range($K$) by Lemma 2.1, we can show that

$\|M - \mathcal{P}^{F}_{Q,k}(M)\|_F^2 \le \|M - M_k\|_F^2 + \frac{\epsilon}{n-p}\|M - M_p\|_F^2$

$= \|M - M_k\|_F^2 + \frac{\epsilon}{n-p}\sum_{i=p+1}^{n}\sigma_i^2 \le \|M - M_k\|_F^2 + \epsilon\sigma_{p+1}^2.$

Then the Frobenius norm bound follows by the definition that $\mathcal{P}^{F}_{Q,k}(M) = Q(Q^TM)_k$.
Then the spectral norm bound follows from the Frobenius norm bound and [4, Theorem
3.4]. □
Table 2
Comparisons of the block Lanczos method with different block sizes. Here $d_b$ and $d_p$ are defined
in Section 6.3. It holds that $d_p \le d_b$. However, there is no inequality relationship between $bd_b$ and
$pd_p$.

Block Size       Time              Space              #Passes
$b$ ($\ge 1$)    $O(n^2bd_b)$      $n \times bd_b$    $d_b$ (6.1)
$p$ ($\ge k$)    $O(n^2pd_p)$      $n \times pd_p$    $d_p$ (6.2)

6.3. Comparisons to the Previous Work. The traditional block Lanczos
methods usually set the block size to be a small integer which is usually smaller than
the target rank $k$. In this work and [12], the block size is set greater than the target
rank $k$. We discussed in Section 1 that our setting of big block size is due to different
motivations from the traditional work. In this subsection we further discuss the pros
and cons of big block size by comparing with the convergence bound of Li & Zhang
(2013) [8].

Li & Zhang's Bound. Li & Zhang [8] established an improved convergence bound
for the block Lanczos method. Li & Zhang claimed that their result is stronger than
Saad's bound [13]. Let $b$ be the block size³ and $K = [X, AX, \cdots, A^{d_b}X] \in \mathbb{R}^{n\times b(d_b+1)}$
be the Krylov matrix. The result of Li & Zhang is complicated, so we set $\lambda_1 = \cdots = \lambda_{k-1}$⁴
and $\lambda_n = 0$ to simplify their result. When

(6.1)  $d_b = O\left(\frac{k\log[\lambda_{k-1}/(\lambda_{k-1}-\lambda_k)] + \log(1/\epsilon)}{\sqrt{(\lambda_k - \lambda_{k+b})/\lambda_{k+b}}}\right),$

it holds that

$\tan\theta_1(u_i, K) \le \epsilon\tan\theta(u_i, X)$  for $i = 1, \cdots, k$.

This Work. To attain the guarantee

$\tan\theta_k(U_k, K) \le \epsilon\tan\theta_k(U_k, X),$

we show that the block Lanczos needs

(6.2)  $d_p = O\left(\frac{\log(1/\epsilon)}{\sqrt{(\lambda_k - \lambda_{p+1})/\lambda_{p+1}}}\right)$


iterations, where p can be set to be any integer greater than or equal to k. In fact, a 
similar (but not the same) result can be derived from Li & Zhang’s work [8, Theorem 
4.1] by setting i = 2, I = k, and nt = p. However, Li & Zhang did not provide random 
initialization analysis, which is no easier than the convergence analysis. 

³For our method, we let $p$ be the block size. We require $p$ to be greater than or equal to the target
rank $k$. In the classical work, e.g. [13, 8], $b$ can be less than $k$.

⁴This is the worst case for Li & Zhang's bound.

Comparisons. We compare the methods in Table 2 and discuss the
effects of block size in the following.

• Effects of the spectral gaps and the error tolerance. When

$\log\frac{1}{\epsilon} \le b\log\frac{\lambda_{k-1}}{\lambda_{k-1}-\lambda_k},$

it holds that $pd_p \le bd_b$.⁵ Thus either when the spectral gaps are small or
when a low-precision solution suffices, big block size has an advantage.

• Time and memory costs. The per-iteration time costs are $O(n^2b)$ (block
size $b$) and $O(n^2p)$ (block size $p$), and thus the total time costs are respectively
$O(n^2bd_b)$ and $O(n^2pd_p)$. The dimensions of the Krylov matrices are
respectively $n \times bd_b$ (block size $b$) and $n \times pd_p$ (block size $p$). When the
spectral gaps are small or $\epsilon$ is not too small, large block size ($p \ge k$) saves
time and memory.

• Pass Efficiency. Pass efficiency is also an important pursuit because every
pass incurs many cache-RAM swaps or even RAM-disk swaps. In practice,
the time cost of the cache-RAM swaps is not negligible compared to the CPU
time. When $A$ does not fit in RAM, the RAM-disk swaps are tremendously
more expensive than the CPU time. The numbers of passes through the data are respectively $d_b$
(block size $b$) and $d_p$ (block size $p$). Not surprisingly,
bigger block size always improves pass efficiency.

We conclude that big block size is a good choice when (1) the spectral gaps are small,
(2) a low-precision solution suffices, or (3) a pass through the data is expensive.
However, in computing a high-precision solution, the results in this paper and [12] may
not be useful because of high computational costs and numerical stability issues.
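
The trade-off between $bd_b$ and $pd_p$ can be made concrete with a small calculation. The snippet below is an illustrative sketch only: it drops the unknown constants hidden in the $O(\cdot)$ notation of (6.1) and (6.2), uses natural logarithms, takes $b = p + 1 - k$ so that both denominators coincide, and plugs in arbitrary example eigenvalues. The function names are hypothetical.

```python
import math

def passes_small_block(k, lam_km1, lam_k, lam_kb, eps):
    """Leading term of (6.1), with the O(.) constant dropped."""
    gap = math.sqrt((lam_k - lam_kb) / lam_kb)
    return (k * math.log(lam_km1 / (lam_km1 - lam_k)) + math.log(1 / eps)) / gap

def passes_big_block(lam_k, lam_p1, eps):
    """Leading term of (6.2), with the O(.) constant dropped."""
    gap = math.sqrt((lam_k - lam_p1) / lam_p1)
    return math.log(1 / eps) / gap

# Example values (assumptions chosen for illustration only).
k, p, eps = 5, 10, 0.1
b = p + 1 - k                            # makes lambda_{k+b} equal to lambda_{p+1}
lam_km1, lam_k, lam_p1 = 1.0, 0.9, 0.8   # lambda_{k-1}, lambda_k, lambda_{p+1}

d_b = passes_small_block(k, lam_km1, lam_k, lam_p1, eps)
d_p = passes_big_block(lam_k, lam_p1, eps)
condition = math.log(1 / eps) <= b * math.log(lam_km1 / (lam_km1 - lam_k))
```

With these values the condition of the first bullet holds, and both the number of passes and the total work (`p * d_p` versus `b * d_b`) favor the big block size, under the caveat that the dropped constants are treated as equal.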

6.4. Proof of the Theorems. This subsection analyzes the convergence of
principal angles and establishes linear convergence rates. Here we only consider an SPSD
matrix $A$, because both $MM^T$ and $M^TM$ are SPSD. We denote the eigenvalue decomposition
of $A$ by

$A = U\Lambda U^T = \underbrace{U_k\Lambda_kU_k^T}_{A_k} + \underbrace{U_{-k}\Lambda_{-k}U_{-k}^T}_{A_{-k}} = \sum_{i=1}^{k}\lambda_iu_iu_i^T + \sum_{i=k+1}^{n}\lambda_iu_iu_i^T,$

where the $\lambda_i$'s are in descending order.

Key lemma. We first establish the following lemma, which will be used in
showing the convergence of the power method and the block Lanczos method.

Lemma 6.4. Let $A$ be an $n \times n$ SPSD matrix, $X$ be an $n \times p$ matrix with
full column rank, and $Z \in \mathbb{R}^{p\times k}$ ($k \le p$) have orthonormal columns and satisfy
range($XZ$) $\subset$ null($[u_{k+1}, \cdots, u_p]^T$). Assume that $f(\Lambda_k) \in \mathbb{R}^{k\times k}$ is nonsingular.
Then for any function $f(\cdot)$ we have

$\tan\theta_k(U_k, f(A)X) \le \tan\theta(U_k, f(A)XZ) \le \frac{\max\{|f(\lambda_{p+1})|, \cdots, |f(\lambda_n)|\}}{\min\{|f(\lambda_1)|, \cdots, |f(\lambda_k)|\}}\, \tan\theta(U_k, XZ).$


⁵When $b + k = p + 1$, the denominators of (6.1) and (6.2) are the same. So we can set $b = p + 1 - k$
to facilitate the comparison.
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Proof. Theorem 4.1 shows the existence of the p x k matrix Z with orthonormal 
columns such that uf XZ = 0 for i = fc + 1, • • • ,p. We bound tan0(Ufc, /(A)XZ) by 


tan6»(Ufe, /(A)XZZ'^) 


||U;J(A)Xw||2 

IMb=i l|Ui’/(A)Xw ||2 

ZZ^w=w 


= max 
||w|| 2 -l 

ZZ^w=w 


W,Ufe/(Afe)U|’Xw + U!:,U_fc/(A_fc)W,Xw||2 

||Uf Ufe/(Afe)Uf Xw + U^U_fc/(A_fc)W^Xw ||2 


||/(A_fe)WfcXw ||2 

IHI2=i ||/(Afc)U^Xw||2 

ZZ^w=w 


where the last equality follows from that \J^XJ-k = 0. The assumption ufXZ = 0 
for i = fc + 1, • • • ,p indicates that 

||/(A_fe)U^,Xw ||2 = ||/(A_p)U^pXw ||2 


We then obtain 


tan0 


(Ufe,/(A)XZ) = 


||/(A_p)U!:pXw ||2 


“|t=i ||/(Afc)U|’Xw ||2 

ZZ^w=w 

||/(A,)|l 2 ||U^,Xw ||2 


- ||[/(Afc)]-i|l-iU|’Xw ||2 

ZZ^ w=w 


= ||/(A_p)|U|[/(A,)] 


-1 I 


max 

||w|| 2 - 

ZZ^w=w 


l|WfeXw ||2 


"2 iiwT=i iiuI’Xwii 


max{|/(Ap+i)|,--- ,|/(A„)|} 

- . f . tan6l(Ufc,XZ). 


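Here the inequality follows from the fact that $f(\Lambda_k)$ is nonsingular and that

\[
\|U_k^T X w\|_2 = \|[f(\Lambda_k)]^{-1} f(\Lambda_k) U_k^T X w\|_2
\le \|[f(\Lambda_k)]^{-1}\|_2 \, \|f(\Lambda_k) U_k^T X w\|_2 ,
\]

and the second equality follows from $\|U_{-p}^T XZZ^T w\|_2 = \|U_{-k}^T XZZ^T w\|_2$. Finally,
the lemma follows from

\[
\tan\theta_k(U_k, f(A)X)
= \min_{P \in \mathcal{P}_k} \max_{\substack{\|w\|_2 = 1 \\ Pw = w}}
  \frac{\|U_{-k}^T f(A) X w\|_2}{\|U_k^T f(A) X w\|_2}
\le \max_{\substack{\|w\|_2 = 1 \\ ZZ^T w = w}}
  \frac{\|U_{-k}^T f(A) X w\|_2}{\|U_k^T f(A) X w\|_2}
= \tan\theta(U_k, f(A)XZ).
\]

□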
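
As an illustration of Lemma 6.4, consider the simplest special case $k = p = 1$ with a
diagonal $A$, so that $U = I_n$, $U_1 = e_1$, and the range constraint on $Z$ is vacuous ($Z = 1$).
The following Python sketch checks the bound numerically in this case. The eigenvalues, the
start vector, and the helper names `tan_angle` and `apply_diag` are hypothetical choices made
only for illustration.

```python
import math

def tan_angle(x):
    """Tangent of the angle between e_1 and the vector x (requires x[0] != 0)."""
    return math.sqrt(sum(v * v for v in x[1:])) / abs(x[0])

def apply_diag(f, eigvals, x):
    """Compute f(A) x for the diagonal matrix A = diag(eigvals)."""
    return [f(lam) * v for lam, v in zip(eigvals, x)]

# Hypothetical data: eigenvalues in descending order and a start vector.
eigvals = [4.0, 2.0, 1.0, 0.5]
x = [0.3, -1.0, 2.0, 0.7]

def f(y):
    """f(y) = y^t with t = 3, as in the power method."""
    return y ** 3

# Left-hand side: tan(theta) after applying f(A).
lhs = tan_angle(apply_diag(f, eigvals, x))

# Right-hand side: max |f(lambda_i)| over i > p, divided by min |f(lambda_i)|
# over i <= k, times the initial tangent (here k = p = 1).
ratio = max(abs(f(lam)) for lam in eigvals[1:]) / abs(f(eigvals[0]))
rhs = ratio * tan_angle(x)

print(lhs <= rhs)
```

In this one-dimensional case the bound is loose whenever the start vector puts weight on
eigenvalues well below $\lambda_{p+1}$, since those components are damped more strongly than
the worst-case ratio suggests.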

Convergence of the power method. This subsection analyzes the convergence
of the power iteration and shows that the principal angle between $\mathrm{range}(U_k)$ and
$\mathrm{range}(A^t X)$ converges exponentially in $t$. The convergence is fast when the spectral
gap is big.

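Theorem 6.5. Let $A$ be an $n \times n$ SPSD matrix, $X$ be an $n \times p$ matrix with
full column rank, and $Z \in \mathbb{R}^{p \times k}$ ($k \le p$) have orthonormal columns and satisfy
$\mathrm{range}(XZ) \subset \mathrm{null}([u_{k+1}, \cdots, u_p]^T)$. Assume that $\lambda_k > 0$. After
$t = \frac{\log(1/\varepsilon)}{\log(\lambda_k/\lambda_{p+1})}$ iterations, the principal angle satisfies

\[
\tan\theta_k(U_k, A^t X) \;\le\; \tan\theta(U_k, A^t XZ) \;\le\; \varepsilon \tan\theta(U_k, XZ).
\]

Proof. The assumption $\lambda_k > 0$ indicates that $\Lambda_k^t \in \mathbb{R}^{k \times k}$ is nonsingular.
We set $f(y) = y^t$ and apply Lemma 6.4 to show that

\[
\tan\theta(U_k, A^t XZ)
\;\le\; \frac{\max\{|\lambda_{p+1}^t|, \cdots, |\lambda_n^t|\}}{\min\{|\lambda_1^t|, \cdots, |\lambda_k^t|\}}
\, \tan\theta(U_k, XZ)
\;=\; \Big(\frac{\lambda_{p+1}}{\lambda_k}\Big)^{t} \tan\theta(U_k, XZ).
\]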


Then the theorem follows by the setting of t. □ 
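
Since each power iteration contracts the tangent by the factor $\lambda_{p+1}/\lambda_k$, the
number of iterations needed for a contraction of $\varepsilon$ can be computed directly. The
following Python sketch does so; the function name `power_iterations` and the numerical values
are hypothetical, and it assumes $\lambda_k > \lambda_{p+1} > 0$.

```python
import math

def power_iterations(lam_k, lam_p1, eps):
    """Integer t = ceil(log(1/eps) / log(lam_k / lam_p1)).

    Up to floating-point rounding, this is the smallest integer t with
    (lam_p1 / lam_k) ** t <= eps. Assumes lam_k > lam_p1 > 0 and 0 < eps < 1.
    """
    return math.ceil(math.log(1.0 / eps) / math.log(lam_k / lam_p1))

# Hypothetical spectrum with a 10% relative gap and a target factor of 1e-3.
lam_k, lam_p1, eps = 1.0, 0.9, 1e-3
t = power_iterations(lam_k, lam_p1, eps)
contraction = (lam_p1 / lam_k) ** t

print(t, contraction <= eps)
```

The count grows like $1/\log(\lambda_k/\lambda_{p+1})$, so a shrinking gap between $\lambda_k$ and
$\lambda_{p+1}$ makes the power iteration slow.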

Convergence of the block Lanczos method. This subsection analyzes the
convergence of the block Lanczos method. We show in Theorem 6.6 that the convergence
depends on $\sqrt{\lambda_k/\lambda_{p+1} - 1}$. When the spectral gap < e, we have that


Xk/Xp+i — 1 > Y^log(Afc/Ap+i) > log(Afc/Ap+i), 

and the block Lanczos method thereby converges faster than the power iteration. 
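
To see the size of the improvement, the following Python sketch evaluates the two quantities
being compared, $\log(\lambda_k/\lambda_{p+1})$ for the power iteration and
$\sqrt{\lambda_k/\lambda_{p+1} - 1}$ for the block Lanczos method, for a few relative gaps. The
function name `rates` and the gap values are hypothetical and chosen only for illustration; the
iteration counts scale inversely with these quantities.

```python
import math

def rates(lam_k, lam_p1):
    """Return (log(lam_k/lam_p1), sqrt(lam_k/lam_p1 - 1)) for lam_k > lam_p1 > 0."""
    ratio = lam_k / lam_p1
    return math.log(ratio), math.sqrt(ratio - 1.0)

# Hypothetical relative gaps lam_k / lam_p1 - 1, from large to small.
for gap in (1.0, 0.1, 0.01, 0.0001):
    log_rate, sqrt_rate = rates(1.0 + gap, 1.0)
    # The last column is the factor by which block Lanczos needs fewer steps.
    print(gap, log_rate, sqrt_rate, sqrt_rate / log_rate)
```

For small gaps the last column behaves like one over the square root of the gap, which is the
regime where the block Lanczos method helps most.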

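Theorem 6.6. Let $A$ be an $n \times n$ SPSD matrix, $X$ be an $n \times p$ matrix with full
column rank, $Z \in \mathbb{R}^{p \times k}$ ($k \le p$) have orthonormal columns and satisfy
$\mathrm{range}(XZ) \subset \mathrm{null}([u_{k+1}, \cdots, u_p]^T)$, and $\phi(\cdot)$ be the function
(parameterized by specific $\alpha$, $\gamma$, and $d$) defined in Lemma 2.1. Assume that
$\lambda_k > 0$. When

\[
d = \frac{1 + \log_2(1/\varepsilon)}{\sqrt{\lambda_k/\lambda_{p+1} - 1}} ,
\]

the following inequality holds:

\[
\tan\theta_k(U_k, \phi(A)X) \;\le\; \tan\theta(U_k, \phi(A)XZ) \;\le\; \varepsilon \tan\theta(U_k, XZ).
\]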


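Proof. We assume that $\lambda_{p+1} > 0$; otherwise we can show $\tan\theta(U_k, \phi(A)XZ) = 0$,
and the theorem holds trivially. We set $f(\cdot)$ to be $\phi(\cdot)$, which is parameterized by
$\alpha = \lambda_{p+1}$, $\gamma = \frac{\lambda_k}{\lambda_{p+1}} - 1$, and
$d = \frac{1 + \log_2(1/\varepsilon)}{\sqrt{\gamma}}$. Since $\lambda_1 \ge \cdots \ge \lambda_k > \alpha$,
it follows from Lemma 2.1 that $\phi(\Lambda_k)$ is nonsingular. So we can apply Lemma 6.4 to show that

\[
\tan\theta(U_k, \phi(A)XZ)
\;\le\; \frac{\max\{|\phi(\lambda_{p+1})|, \cdots, |\phi(\lambda_n)|\}}{\min\{|\phi(\lambda_1)|, \cdots, |\phi(\lambda_k)|\}}
\, \tan\theta(U_k, XZ).
\]

We apply Lemma 2.1 to obtain

\[
\max\{|\phi(\lambda_{p+1})|, \cdots, |\phi(\lambda_n)|\} \le \varepsilon \lambda_{p+1},
\qquad
\min\{|\phi(\lambda_1)|, \cdots, |\phi(\lambda_k)|\} \ge \lambda_k .
\]

Thus

\[
\tan\theta(U_k, \phi(A)XZ)
\;\le\; \varepsilon \, \frac{\lambda_{p+1}}{\lambda_k} \, \tan\theta(U_k, XZ)
\;\le\; \varepsilon \tan\theta(U_k, XZ).
\]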


Finally the theorem follows by the setting of $d$. □

7. Conclusions and Discussions. We have studied the power method (the
simultaneous subspace iteration) and the block Lanczos method and established several
error bounds stronger than those in the literature. Firstly, the power method has
been widely used to refine the sketch obtained by random projection or column selection.
We have shown that with power iterations, the power method attains

the $\sigma_{k+1}^2 + \varepsilon \sigma_{p+1}^2$ spectral norm bound, which is stronger than the
$(1+\varepsilon)\sigma_{k+1}^2$ bound in the literature. Secondly, through recording the output of
every power iteration to form a larger-scale sketch, only power iterations are required to attain
the $\sigma_{k+1}^2 + \varepsilon \sigma_{p+1}^2$ spectral norm bound. Our result is stronger than the
$(1+\varepsilon)\sigma_{k+1}^2$ bound of Musco & Musco (2015). Thirdly, we have established the first
spectral gap-independent bound for the warm-started block Lanczos method. Finally, we have
also shown gap-dependent bounds for the randomized power iteration and the block
Lanczos method, which demonstrate a linear convergence rate and a relatively weak dependence
on the spectral gaps. Though the convergence analysis of the gap-dependent
bounds is similar to the existing work, our initialization analysis is new and more useful
than the previous work.

Appendix A. Proof of Lemma 2.1. 

It was shown in [12, 15] that when $y > 1$, the Chebyshev polynomial can be
expressed by

\[
T_d(y) = \tfrac{1}{2}\big(y + \sqrt{y^2 - 1}\big)^d + \tfrac{1}{2}\big(y - \sqrt{y^2 - 1}\big)^d .
\]

Therefore, $T_d(1+\gamma) > 0$ always holds for $\gamma > 0$.
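
The closed form can be checked against the standard three-term recurrence
$T_0(y) = 1$, $T_1(y) = y$, $T_{j+1}(y) = 2y\,T_j(y) - T_{j-1}(y)$. The Python sketch below does
this for one choice of parameters; the function names and the values of $\gamma$ and $d$ are
hypothetical and serve only as an illustration.

```python
import math

def cheb_recurrence(d, y):
    """T_d(y) via T_0 = 1, T_1 = y, T_{j+1} = 2*y*T_j - T_{j-1}."""
    prev, cur = 1.0, y
    if d == 0:
        return prev
    for _ in range(d - 1):
        prev, cur = cur, 2.0 * y * cur - prev
    return cur

def cheb_closed_form(d, y):
    """Closed form of T_d(y), used here only for y >= 1."""
    s = math.sqrt(y * y - 1.0)
    return 0.5 * (y + s) ** d + 0.5 * (y - s) ** d

# Hypothetical parameters with gamma in (0, 1].
gamma, d = 0.25, 8
a = cheb_recurrence(d, 1.0 + gamma)
b = cheb_closed_form(d, 1.0 + gamma)

print(a, b, a > 0.0)
```

Both terms of the closed form are positive when $y > 1$, which is why $T_d(1+\gamma)$ cannot
vanish or change sign for $\gamma > 0$.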

We now prove the lemma. The first property follows from the definition of $\phi$.
When $x > \alpha$, we have $T_d(x/\alpha) > 0$, and thus $\phi(x) > 0$. Then the second property
follows. Lemma 5 of [12] shows that $p(x) \ge x$ for all $x \ge (1+\gamma)\alpha$, and thus the third
property holds.

We show the fourth property in the following. Assume that $x > 0$; otherwise
$\phi(x) = \phi(0) = 0$, and the fourth property holds trivially. Let $T_d(x)$ be the degree-$d$
Chebyshev polynomial. By the definition of $\phi(\cdot)$ and $p(\cdot)$, we have that for $x > 0$,

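\[
|\phi(x)| = |p(x)| = \frac{(1+\gamma)\alpha \, |T_d(x/\alpha)|}{T_d(1+\gamma)}
\;\le\; \frac{(1+\gamma)\alpha}{T_d(1+\gamma)} ,
\tag{A.1}
\]

where the inequality follows from $x/\alpha \le 1$, $T_d(1+\gamma) > 0$, and $T_d(y) \in [-1, 1]$
for all $y \in [-1, 1]$. Lemma 5 of [12] shows that

\[
T_d(1+\gamma) \ge 2^{d\sqrt{\gamma} - 1} > 0 .
\]

It follows from $\gamma \le 1$ that

\[
|\phi(x)| \;\le\; \alpha \, \frac{1+\gamma}{2^{d\sqrt{\gamma} - 1}} \;\le\; \alpha \, \frac{2}{2^{d\sqrt{\gamma} - 1}} ,
\]

by which the fourth property follows.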

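We prove the last property in the following. Since $p(\cdot)$ is a polynomial, we can
express $p(x)$ by

\[
p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_d x^d ,
\]

where $c_0, \cdots, c_d$ are the coefficients. We let $A = U \Lambda U^T$ be the eigenvalue
decomposition. Let $\rho = \mathrm{rank}(A)$. Then $\lambda_1 \ge \cdots \ge \lambda_\rho > 0$ and
$\lambda_{\rho+1} = \cdots = \lambda_n = 0$. The matrix function $\phi(A)$ can be written as

\[
\begin{aligned}
\phi(A) &= \sum_{i=1}^{n} \phi(\lambda_i) u_i u_i^T
         = \sum_{i=1}^{\rho} \phi(\lambda_i) u_i u_i^T
         = \sum_{i=1}^{\rho} \sum_{j=0}^{d} c_j \lambda_i^j u_i u_i^T \\
        &= \sum_{j=0}^{d} c_j \sum_{i=1}^{\rho} \lambda_i^j u_i u_i^T
         = \sum_{j=0}^{d} c_j \sum_{i=1}^{n} \lambda_i^j u_i u_i^T
         = \sum_{j=0}^{d} c_j A^j .
\end{aligned}
\]

Here the second equality follows from $\phi(\lambda_{\rho+1}) = \cdots = \phi(\lambda_n) = 0$. We can see
that $\phi(A)X = \sum_{j=0}^{d} c_j A^j X$ can be written as a linear combination of the (column)
blocks of $K$. Thus $\mathrm{range}(\phi(A)X) \subset \mathrm{range}(K)$.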
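
To make the range inclusion concrete, consider the case $d = 2$. The worked equation below
assumes that $K$ denotes the block Krylov matrix $[X, AX, \cdots, A^d X]$; its definition lies
outside this appendix, so this form is an assumption made for illustration.

```latex
\phi(A) X \;=\; c_0 X + c_1 A X + c_2 A^2 X
\;=\; \underbrace{\begin{bmatrix} X & A X & A^2 X \end{bmatrix}}_{K}
\begin{bmatrix} c_0 I_p \\ c_1 I_p \\ c_2 I_p \end{bmatrix}
```

Each column of $\phi(A)X$ is thus a linear combination of the columns of $K$, which is exactly
the claimed inclusion.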

Appendix B. Analysis of Matrix Function. 

Let $f$ be any function parameterized by $\alpha > 0$ and $\gamma \in (0, 1]$. The function
$f$ satisfies that (i) $f(0) = 0$, (ii) $f(x) > 0$ when $x > \alpha$, and (iii) $f(x) \ge x$ when
$x \ge (1+\gamma)\alpha$. The following lemmas hold for such a function.

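Lemma B.1. Let $M$ be an $m \times n$ matrix and $M = U \Sigma V^T$ be its full SVD.
Assume that

\[
\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_k^2 \ge (1+\gamma)\alpha
\]

and that $r \ge k$ and $\sigma_r^2 > \alpha$. Then for any orthogonal projector $P$, the following
inequalities hold:

\[
\|P M_k\|_2^2 \;\le\; \|P f(M_k M_k^T) P\|_2 \;\le\; \|P f(M_r M_r^T) P\|_2 .
\]

Proof. Differently from the definitions elsewhere, here we define $\Sigma_k \in \mathbb{R}^{m \times n}$ by
setting the $(k+1)$-th to the last singular values to zero. Thus $M_k = U \Sigma_k V^T$. We
similarly define $\Sigma_r \in \mathbb{R}^{m \times n}$ and $M_r$.

The third property of $f$ indicates that $\sigma_i^2 \le f(\sigma_i^2)$ for all $i \le k$, and therefore
$\Sigma_k \Sigma_k^T \preceq f(\Sigma_k \Sigma_k^T)$. Since $r \ge k$, $f(0) = 0$, and
$f(\sigma_{k+1}^2), \cdots, f(\sigma_r^2) > 0$, we have
$f(\Sigma_k \Sigma_k^T) \preceq f(\Sigma_r \Sigma_r^T)$. We then apply [7, Theorem 7.7.2] to show that

\[
P U \Sigma_k \Sigma_k^T U^T P^T \;\preceq\; P U f(\Sigma_k \Sigma_k^T) U^T P^T \;\preceq\; P U f(\Sigma_r \Sigma_r^T) U^T P^T
\]

for any matrix $P$. It is equivalent to

\[
P M_k M_k^T P \;\preceq\; P f(M_k M_k^T) P \;\preceq\; P f(M_r M_r^T) P .
\]

Corollary 7.7.4 of [7] shows that for $i = 1, \cdots, n$, the eigenvalues satisfy

\[
\lambda_i(P M_k M_k^T P) \;\le\; \lambda_i(P f(M_k M_k^T) P) \;\le\; \lambda_i(P f(M_r M_r^T) P) .
\]

It follows that

\[
\|P M_k M_k^T P\|_2 \;\le\; \|P f(M_k M_k^T) P\|_2 \;\le\; \|P f(M_r M_r^T) P\|_2 .
\]

The lemma follows from $\|P M_k\|_2^2 = \|P M_k M_k^T P\|_2$. □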

Lemma B.2. Let $M \in \mathbb{R}^{m \times n}$ be any matrix. Assume that $\sigma_k^2 > \alpha$. Then for any
positive integer $k$ ($< m, n$) and any orthogonal projector $T$, the following holds:

\[
\|T M\|_2^2 \;\le\; \|T f(M_k M_k^T)\|_2 + \max\{\sigma_{k+1}^2, (1+\gamma)\alpha\} .
\]

Proof. Let $\tilde{k} = \mathrm{cardinality}(S)$ where $S$ is the index set

\[
S = \{ i \in [k] \mid \sigma_i^2 \ge (1+\gamma)\alpha \} .
\]

Since $k \ge \tilde{k}$, $\sigma_{\tilde{k}}^2 \ge (1+\gamma)\alpha$, and $\sigma_k^2 > \alpha$, it follows from
Lemma B.1 that

\[
\|T M_{\tilde{k}}\|_2^2 \;\le\; \|T f(M_k M_k^T) T\|_2 \;\le\; \|T f(M_k M_k^T)\|_2 .
\]

We then apply the matrix Pythagorean theorem and obtain

\[
\|T M\|_2^2 \;\le\; \|T M_{\tilde{k}}\|_2^2 + \|T M_{-\tilde{k}}\|_2^2
\;\le\; \|T f(M_k M_k^T)\|_2 + \sigma_{\tilde{k}+1}^2 .
\]

In the following we consider two cases: (1) $\sigma_{\tilde{k}+1}^2 \ge (1+\gamma)\alpha$, and
(2) $\sigma_{\tilde{k}+1}^2 < (1+\gamma)\alpha$.

In the former case $\tilde{k} = k$ must hold for the following reasons. By the definition,
$\tilde{k}$ cannot exceed $k$. If $\tilde{k} < k$, we have that $\tilde{k} + 1$ is in $S$ (by the definition of $S$).
However, this will make the cardinality of $S$ greater than $\tilde{k}$, a contradiction. In this
case we conclude that

\[
\|T M\|_2^2 \;\le\; \|T f(M_k M_k^T)\|_2 + \sigma_{k+1}^2 .
\]

In the latter case, we directly obtain

\[
\|T M\|_2^2 \;\le\; \|T f(M_k M_k^T)\|_2 + (1+\gamma)\alpha .
\]

Then the lemma follows from the above two inequalities. □
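
The case analysis in this proof can be traced on concrete numbers. The Python sketch below
computes the cardinality of $S$ and checks that the tail term $\sigma^2$ at index
$|S| + 1$ never exceeds $\max\{\sigma_{k+1}^2, (1+\gamma)\alpha\}$. The function name
`additive_term` and the singular values are hypothetical and serve only as an illustration.

```python
def additive_term(sigmas, k, alpha, gamma):
    """Return (k_tilde, tail, bound) for the case analysis of Lemma B.2.

    sigmas : singular values in descending order (0-based list, len > k)
    k_tilde: number of i <= k with sigma_i^2 >= (1 + gamma) * alpha, i.e. |S|
    tail   : sigma_{k_tilde + 1}^2 in the 1-based indexing of the text
    bound  : max(sigma_{k+1}^2, (1 + gamma) * alpha)
    """
    thresh = (1.0 + gamma) * alpha
    k_tilde = sum(1 for s in sigmas[:k] if s * s >= thresh)
    tail = sigmas[k_tilde] ** 2
    bound = max(sigmas[k] ** 2, thresh)
    return k_tilde, tail, bound

# Case (2): sigma_3^2 = 1.44 falls below the threshold 1.5, so k_tilde = 2 < k.
kt_a, tail_a, bound_a = additive_term([3.0, 2.0, 1.2, 1.0, 0.5], 3, 1.0, 0.5)
print(kt_a, tail_a <= bound_a)

# Case (1): all of the top k = 3 values pass the threshold, so k_tilde = k.
kt_b, tail_b, bound_b = additive_term([3.0, 2.0, 1.5, 1.3, 0.5], 3, 1.0, 0.5)
print(kt_b, tail_b <= bound_b)
```

In case (1) the tail term coincides with $\sigma_{k+1}^2$, and in case (2) it is dominated by
$(1+\gamma)\alpha$, so taking the maximum covers both.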

Appendix C. Analysis of the Matrix Norm Error. 

In this subsection we analyze the matrix norm errors and establish Lemma C.1
and Lemma C.3.

C.1. Matrix Norm Error Bounds. Lemma C.1 and Corollary C.2 are useful
in analyzing the matrix norm error and help convert principal angle bounds
to matrix norm bounds.

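Lemma C.1. Let $M \in \mathbb{R}^{m \times n}$ be any matrix. We decompose $M$ by $M = M_1 + M_2$
such that $M_1 M_2^T = 0$ and $\mathrm{rank}(M_1) = k$. Let the right singular vectors of $M_1$ and
$M_2$ be respectively $V_1 \in \mathbb{R}^{n \times k}$ and $V_2 \in \mathbb{R}^{n \times (n-k)}$ (obviously
$V_1^T V_2 = 0$). Let $X \in \mathbb{R}^{n \times p}$ be any matrix such that $\mathrm{rank}(V_1^T X) = k$.
Then for the $p \times p$ rank $k$ orthogonal projector

\[
\Pi = \operatorname*{argmin}_{P \in \mathcal{P}_k} \tan\theta(V_1, XP),
\]

the following inequality holds:

\[
\|M_1 - (MX\Pi)(MX\Pi)^\dagger M_1\|_\xi^2 \;\le\; \tan^2\theta_k(V_1, X) \, \|M_2\|_\xi^2 .
\]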


Proof. Assume that $\tan\theta_k(V_1, X) < \infty$, otherwise the lemma holds trivially.
Obviously $\Pi$ satisfies $\mathrm{rank}(X\Pi) = k$, otherwise
$\tan\theta_k(V_1, X) := \tan\theta(V_1, X\Pi) = \infty$, which violates the assumption.

Let $Z \in \mathbb{R}^{p \times k}$ have orthonormal columns and satisfy $ZZ^T = \Pi$. The above
assumption ensures $\mathrm{rank}(XZ) = \mathrm{rank}(XZZ^T) = \mathrm{rank}(X\Pi) = k$. Let $XZ = QR$ be
the QR decomposition, where $Q$ and $R$ are respectively $n \times k$ and $k \times k$ matrices.
That $\mathrm{rank}(XZ) = k$ implies $R$ is nonsingular. It follows that $MXZ = (MQ)R$ and
$(MXZ)R^{-1} = MQ$, which respectively indicate $\mathrm{range}(MXZ) \subset \mathrm{range}(MQ)$ and
$\mathrm{range}(MQ) \subset \mathrm{range}(MXZ)$, and therefore

\[
\mathrm{range}(MXZ) = \mathrm{range}(MQ).
\]

It follows that

\[
\begin{aligned}
\|M_1 - (MXZ)(MXZ)^\dagger M_1\|_\xi^2
&= \|M_1 - (MQ)(MQ)^\dagger M_1\|_\xi^2 \\
&\le \|M_1 - \mathcal{P}^{\xi}_{MQ,k}(M_1)\|_\xi^2 \;\le\; \|M_2 Q (V_1^T Q)^\dagger\|_\xi^2 \\
&\le \|M_2\|_\xi^2 \, \|V_2 V_2^T Q (V_1^T Q)^\dagger\|_2^2 .
\end{aligned}
\tag{C.1}
\]

Here the first inequality follows from (2.2), the second inequality follows from Lemma C.5,
and the third inequality follows from $M_2 = M_2 V_2 V_2^T$. We bound the term
$\|V_2 V_2^T Q (V_1^T Q)^\dagger\|_2^2$ by

\[
\begin{aligned}
\|V_2 V_2^T Q (V_1^T Q)^\dagger\|_2^2
&\le \|V_2 V_2^T Q\|_2^2 \, \|(V_1^T Q)^\dagger\|_2^2 \\
&= \|(I_n - V_1 V_1^T) Q\|_2^2 \, \sigma_k^{-2}(V_1^T Q)
 = \tan^2\theta(V_1, Q) = \tan^2\theta(V_1, XZ).
\end{aligned}
\]

Here the first equality follows from the fact that $V_2$ is the orthogonal complement of $V_1$,
the second equality follows from the definition of $\tan\theta$ and the fact that $Q$ has orthonormal
columns, and the last equality follows from $\mathrm{range}(XZ) = \mathrm{range}(Q)$. We thus
obtain

\[
\|M_1 - (MXZ)(MXZ)^\dagger M_1\|_\xi^2 \;\le\; \tan^2\theta(V_1, XZ) \, \|M_2\|_\xi^2 .
\]

The lemma follows from $\Pi = ZZ^T$ and $\tan\theta_k(V_1, X) = \tan\theta(V_1, X\Pi)$.

□

Corollary C.2. Under the notation of Lemma C.1, we have that

\[
\|M - (MX\Pi)(MX\Pi)^\dagger M\|_\xi^2 \;\le\; \big(1 + \tan^2\theta_k(V_1, X)\big) \, \|M_2\|_\xi^2 .
\]

Proof. It follows from $M_1 + M_2 = M$, $M_1 M_2^T = 0$, and the matrix Pythagorean
theorem that

\[
\begin{aligned}
\|M - (MX\Pi)(MX\Pi)^\dagger M\|_\xi^2
&= \big\|\big(I_m - (MX\Pi)(MX\Pi)^\dagger\big)(M_1 + M_2)\big\|_\xi^2 \\
&\le \big\|\big(I_m - (MX\Pi)(MX\Pi)^\dagger\big) M_1\big\|_\xi^2
   + \big\|\big(I_m - (MX\Pi)(MX\Pi)^\dagger\big) M_2\big\|_\xi^2 \\
&\le \big\|\big(I_m - (MX\Pi)(MX\Pi)^\dagger\big) M_1\big\|_\xi^2 + \|M_2\|_\xi^2 .
\end{aligned}
\]

Then the corollary follows directly from Lemma C.1. □

C.2. Stronger Matrix Norm Error Bounds. Under stronger assumptions,
Lemma C.1 can be further strengthened.

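Lemma C.3. Let $M$ and $X$ be defined in Lemma C.1 and $p$ and $k$ ($k \le p \le n$)
be positive integers. We decompose $M$ by

\[
M = M_1 + M_2 + M_3 = U_1 \Sigma_1 V_1^T + U_2 \Sigma_2 V_2^T + U_3 \Sigma_3 V_3^T ,
\]

where $V_1 \in \mathbb{R}^{n \times k}$, $V_2 \in \mathbb{R}^{n \times (p-k)}$, $V_3 \in \mathbb{R}^{n \times (n-p)}$ are
orthogonal to each other. Let $Z \in \mathbb{R}^{p \times k}$ have orthonormal columns and satisfy
$\mathrm{range}(XZ) \subset \mathrm{null}(V_2^T)$. Then

\[
\|M_1 - (MXZ)(MXZ)^\dagger M_1\|_\xi^2 \;\le\; \tan^2\theta(V_1, XZ) \, \|M_3\|_\xi^2 .
\]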

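Proof. Theorem 4.1 guarantees the existence of $Z$. Let $XZ = QR$ be the QR
decomposition of $XZ$. Based on the same argument in the proof of Lemma C.1, we
have $\mathrm{range}(MXZ) = \mathrm{range}(MQ)$, and thus (C.1) also holds:

\[
\|M_1 - (MXZ)(MXZ)^\dagger M_1\|_\xi^2 \;\le\; \|(M_2 + M_3) Q (V_1^T Q)^\dagger\|_\xi^2 .
\]

By the definition of $Z$ we have

\[
M_2 Q = (U_2 \Sigma_2 V_2^T)(XZR^{-1}) = U_2 \Sigma_2 \, 0 \, R^{-1} = 0,
\]

and thus

\[
\begin{aligned}
\|M_1 - (MXZ)(MXZ)^\dagger M_1\|_\xi^2
&\le \|M_3 Q (V_1^T Q)^\dagger\|_\xi^2 \\
&= \|M_3 V_3 V_3^T Q (V_1^T Q)^\dagger\|_\xi^2
 \;\le\; \|M_3\|_\xi^2 \, \|V_3 V_3^T Q (V_1^T Q)^\dagger\|_2^2 \\
&= \|M_3\|_\xi^2 \, \|(V_2 V_2^T + V_3 V_3^T) Q (V_1^T Q)^\dagger\|_2^2 \\
&= \|M_3\|_\xi^2 \, \|(I_n - V_1 V_1^T) Q (V_1^T Q)^\dagger\|_2^2 ,
\end{aligned}
\]

where the second equality follows from $V_2^T Q = V_2^T XZR^{-1} = 0$, and the last
equality follows from the fact that $[V_2, V_3]$ is the orthogonal complement of $V_1$. Using the
same argument as the proof of Lemma C.1, we have

\[
\|(I_n - V_1 V_1^T) Q (V_1^T Q)^\dagger\|_2^2 \;\le\; \tan^2\theta(V_1, Q) = \tan^2\theta(V_1, XZ),
\]

by which the lemma follows. □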

Corollary C.4. Under the notation of Lemma C.3, we have that

\[
\|M - (MXZ)(MXZ)^\dagger M\|_\xi^2 \;\le\; \|M_2 + M_3\|_\xi^2 + \tan^2\theta(V_1, XZ) \, \|M_3\|_\xi^2 .
\]

Proof. It follows from $M_1 + M_2 + M_3 = M$, $M_1 (M_2 + M_3)^T = 0$, and the matrix
Pythagorean theorem that

\[
\begin{aligned}
\|M - (MXZ)(MXZ)^\dagger M\|_\xi^2
&= \big\|\big(I_m - (MXZ)(MXZ)^\dagger\big)(M_1 + M_2 + M_3)\big\|_\xi^2 \\
&\le \big\|\big(I_m - (MXZ)(MXZ)^\dagger\big) M_1\big\|_\xi^2
   + \big\|\big(I_m - (MXZ)(MXZ)^\dagger\big)(M_2 + M_3)\big\|_\xi^2 \\
&\le \big\|\big(I_m - (MXZ)(MXZ)^\dagger\big) M_1\big\|_\xi^2 + \|M_2 + M_3\|_\xi^2 .
\end{aligned}
\]

Then the corollary follows directly from Lemma C.3. □

C.3. Extension of [2, Lemma 3.1]. This subsection offers a variant of [2,
Lemma 3.1], and it is used in the proof of Lemma C.1 and Lemma C.3.

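Lemma C.5. Let $M \in \mathbb{R}^{m \times n}$ be any matrix. We decompose $M$ by $M = M_1 + M_2$
such that $\mathrm{rank}(M_1) = k$. Let the right singular vectors of $M_1$ be $V_1 \in \mathbb{R}^{n \times k}$.
Let $X \in \mathbb{R}^{n \times p}$ be any matrix such that $\mathrm{rank}(V_1^T X) = k$ and let
$C = MX \in \mathbb{R}^{m \times p}$. Then

\[
\|M_1 - \mathcal{P}^{\xi}_{C,k}(M_1)\|_\xi^2 \;\le\; \|M_2 X (V_1^T X)^\dagger\|_\xi^2 .
\]

Proof. The proof mirrors Lemma 3.1 of [2]. Let $\xi = 2$ or $F$ and $M_1 = U_1 \Sigma_1 V_1^T$.
We expand the error term by

\[
\begin{aligned}
\|M_1 - \mathcal{P}^{\xi}_{C,k}(M_1)\|_\xi^2
&= \min_{\mathrm{rank}(Y) \le k} \|M_1 - CY\|_\xi^2
 \;\le\; \|M_1 - C (V_1^T X)^\dagger V_1^T\|_\xi^2 \\
&= \|M_1 - (M_1 + M_2) X (V_1^T X)^\dagger V_1^T\|_\xi^2 \\
&= \big\|M_1 - U_1 \Sigma_1 \underbrace{(V_1^T X)}_{k \times p} \underbrace{(V_1^T X)^\dagger}_{p \times k} V_1^T
     - M_2 X (V_1^T X)^\dagger V_1^T\big\|_\xi^2 \\
&= \|M_1 - U_1 \Sigma_1 V_1^T - M_2 X (V_1^T X)^\dagger V_1^T\|_\xi^2 \\
&= \|M_2 X (V_1^T X)^\dagger V_1^T\|_\xi^2 .
\end{aligned}
\]

Here the fourth equality follows from $(V_1^T X)(V_1^T X)^\dagger = I_k$ when $k \le p$ and
$\mathrm{rank}(V_1^T X) = k$. Since $V_1$ has orthonormal columns, the last term equals
$\|M_2 X (V_1^T X)^\dagger\|_\xi^2$, which gives the claimed bound. □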